Arrangement and a method for handling failures in a network

ABSTRACT

The present invention relates to an arrangement comprising an Ethernet carrier network, managed by a management system (100) and comprising a number of levels of management domains, each comprising a number of edge nodes (10, 20, 30, 40) and a number of intermediate nodes (1, 2, 3, 4, 5, 6, 7, 8). Each edge node (10, 20, 30, 40) comprises fault detection means using connectivity check messages and fault locating means with holding means (11, 21, 31, 41) adapted to hold path information for the paths to all other edge nodes. A fault detection means of an edge node having detected a fault is adapted to activate the edge node fault locating means to locate the fault, using the path information in the edge node holding means to find a first target intermediate node substantially in the middle of the path containing the fault, and to send a first loop back message directly to said intermediate node. A response from said intermediate node indicates that the fault is located at the distant portion of the path; no response indicates that the fault is located at the proximate portion of the path. Further loop back messages are sent to consecutively identified intermediate nodes of the respective path portion identified to contain the fault by the respective preceding loop back message until the fault is localized.

TECHNICAL FIELD

The present invention relates to fault handling in an Ethernet network which is managed by a management system and which comprises a number of levels of management domains.

STATE OF THE ART

Ethernet is the leading Local Area Network (LAN) technology. This is mainly due to a number of intrinsic characteristics which make it simple, cheap, easy to manage and backwards compatible. Operators and carriers are considering the possibility of reaping the benefits of Ethernet by replacing SONET/SDH (Synchronous Optical Networking/Synchronous Digital Hierarchy) infrastructures with Ethernet infrastructures, since data services account for the bulk of traffic. However, backbone networks and metro networks have requirements which differ considerably from those of enterprise LANs. Consequently, a deployment of Ethernet technology would require specific enhancements and several modifications in order to fulfil such carrier-grade requirements. Among other things, the modifications needed to make an Ethernet network carrier-grade include a fast failure localization and recovery mechanism, which is essential in order to support real time services.

In several network management systems SNMP (Simple Network Management Protocol) is used for detecting or locating link or node failures in the network. When a link or node failure occurs, the affected network device will send an SNMP trap to the SNMP server in the Network Management System (NMS) as an alarm. The NMS can then take responsive actions for recovery, for example set up an alternative path.

IEEE has specified a standard, 802.1ag “Connectivity Fault Management”, addressing Ethernet Connectivity Fault Management (CFM). It provides a complete architecture with functional blocks and a protocol set. In the standard the end-to-end network connection is split into different levels of management domains, or so called Maintenance Associations (MA). Within each MA level, a number of Maintenance End Points (MEP) and Maintenance Intermediate Points (MIP) are allocated. Within each MA level the procedure of failure detection and localization briefly takes place as follows: MEPs send Connectivity Check Messages (CCMs) periodically as heartbeat messages based on a pre-defined time interval; an MEP raises an alarm to the NMS if no CCM is received within a time period of three times the pre-defined time interval; the MEP sends Link Trace messages to obtain address information of MIPs; the MEP sends Loop Back messages to corresponding MIPs to locate a failure. Subsequently a Spanning Tree Protocol (STP) or a Rapid Spanning Tree Protocol (RSTP) is used to re-calculate the path.
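The detection rule described above (an alarm when no CCM has arrived for three times the pre-defined interval) can be illustrated with a minimal sketch. The class name `CcmMonitor`, its methods and the use of plain floating point timestamps are assumptions made for illustration only; they are not taken from the standard.

```python
from dataclasses import dataclass, field

LOSS_FACTOR = 3  # alarm after three CCM intervals without a message


@dataclass
class CcmMonitor:
    """Hypothetical helper: remembers when each remote MEP was last heard."""
    interval: float                              # pre-defined CCM interval (s)
    last_seen: dict = field(default_factory=dict)

    def on_ccm(self, mep_id: str, now: float) -> None:
        # Record the arrival time of a CCM from a remote MEP.
        self.last_seen[mep_id] = now

    def lost_peers(self, now: float) -> list:
        # A peer counts as lost when it has been silent for more than
        # LOSS_FACTOR times the CCM interval.
        limit = LOSS_FACTOR * self.interval
        return sorted(m for m, t in self.last_seen.items() if now - t > limit)


monitor = CcmMonitor(interval=1.0)
monitor.on_ccm("E2", now=0.0)
monitor.on_ccm("E3", now=0.0)
monitor.on_ccm("E2", now=3.0)        # E2 keeps sending, E3 has gone silent
print(monitor.lost_peers(now=3.5))   # ['E3']
```

Passing the current time in explicitly keeps the sketch deterministic; a real MEP would drive the check from a timer.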

CFM also offers alternative mechanisms for fault detection and localization. CFM with the extensions suggested in ITU-T Y.1731 “OAM FUNCTIONS AND MECHANISMS FOR ETHERNET BASED NETWORKS” may generate an AIS (Alarm Indication Signal) notifying the egress or the NMS of the fault and its location. Alternatively, link level MEPs may be deployed which detect errors and notify the NMS or, as with SNMP, generate AIS for higher layer MAs.

It is also possible to use GMPLS (Generalised MPLS) and LMP (Link Management Protocol) for purposes of fault localization.

It is however a disadvantage that SNMP based solutions are extremely slow, which makes them unsuitable for carrier Ethernet architectures.

802.1ag is based on layer 2 technology and it can be made fairly fast. However, it suffers from disadvantages and limitations which make the performance non-optimal. First, 802.1ag assumes that MIPs are invisible to the NMS and MEPs. Since 802.1ag aims at a solution for end-to-end service provisioning in a very general scope, it is a very flexible solution and is therefore not optimized as far as speed and performance are concerned. Second, 802.1ag defines the information a CCM can provide in terms of a connectivity fault, but it does not specify who the recipient of the information is or how the information shall be used and reacted upon. Further, 802.1ag uses STP or RSTP to find an alternative path when a failure occurs, but STP as well as RSTP can have a quite slow convergence time. In addition, operators may choose to disable STP or RSTP in their networks. Still further, CFM with AIS is not applicable to multi-point Ethernet services, which is a serious drawback since multi-point Ethernet services are exceedingly important for the provisioning of multi-media applications, such as for example IPTV. In addition, link level MEPs can impose a limitation as far as scalability is concerned. Another disadvantage is that interactions between GMPLS, LMP and CFM are not clear.

Thus, these disadvantages contribute to reducing the possibility and attractiveness of using Ethernet as a carrier network, mainly in that fault detection and localization is not as fast and efficient as needed and desirable for several applications.

SUMMARY

It is an object of the present invention to solve one or more of the above mentioned problems and to provide an arrangement as initially referred to which enables fast detection and localization of faults. It is also an object of the invention to provide an arrangement enabling easy and reliable failure (fault) detection and localization. Still further it is an object to provide a solution which imposes no scalability limitations and which allows applicability to multipoint Ethernet services. It is most particularly an object of the invention to enable fast and simple recovery after detection and localization of a fault. Still further it is an object of the invention to provide an arrangement which does not suffer from a slow convergence time, through which the Ethernet network can be made carrier-grade and through which real time services can be delivered in a satisfactory manner.

It is also an object of the invention to provide an edge node as initially referred to through which one or more of the above mentioned objects can be achieved. It is still further an object of the invention to provide a method through which one or more of the above mentioned objects can be achieved.

According to the present invention a solution is suggested through which IEEE 802.1ag can be said to be extended such that fast, easy and efficient fault detection and localization can be provided, and particularly or optionally also fault recovery.

An arrangement as initially referred to is therefore provided which comprises a number of levels of management domains, each comprising a number of edge nodes (e.g. maintenance end points) and a number of intermediate nodes (e.g. maintenance intermediate points). Each management domain particularly corresponds to a level, or vice versa. According to the invention, each edge node in a maintenance domain comprises holding means adapted to hold path information for the paths to all other edge nodes in the management domain. This path information comprises information about all intermediate nodes of the respective paths in the maintenance domain. The edge nodes are adapted to detect faults or failures by sending and receiving connectivity check messages. An edge node having detected a fault is adapted to identify a target intermediate node substantially in the middle of the path between the edge node and another edge node not responding to a connectivity check message (CCM) (or from which it does not receive a CCM for a given time period), and to directly send a first loop back message to said identified target intermediate node. It is further adapted to establish if a response is received from the target intermediate node. Reception of a response indicates that the fault is located at the distant part of the path, connecting to or closer to the other edge node. No response indicates that the fault is located at the proximate part of the path, connecting to or being closer to the edge node having detected the fault. The detecting edge node is adapted to repeat the procedure by identifying a target intermediate node substantially in the middle of the portion, distant or proximate, of the path identified to contain the fault by the respective preceding loop back message, and identifying the respective “subsequent” fault portion, distant or proximate, containing the fault until the fault has been localized.

According to the invention an edge node, or a maintenance end point, of an Ethernet based carrier or access network adapted to be managed by a management system is also provided, which comprises fault detection means comprising means for sending/receiving connectivity check messages, and fault localization means. Said fault localization means comprise or communicate with holding means for holding path information for all the paths between the edge node and all other edge nodes in one and the same management domain, including all intermediate nodes of the respective paths. The fault localization means are adapted to, at detection (by the fault detection means) of a fault on a path to another edge node, identify a target intermediate node substantially in the middle of the path, to send a loop back message directly to said target node, and to identify, based on whether a response is received from the target node or not, whether the distant portion of the path, connecting to the distant or second edge node, or the proximate portion, connecting to the first or detecting edge node, is the path portion containing the fault. The fault localization means are also adapted to, unless the fault is already localized, identify a new target intermediate node substantially in the middle of the path portion previously identified as containing the fault and to send a loop back message to said new target intermediate node, and to repeat the procedure of finding a subsequent target intermediate node, identifying the fault containing path portion etc. until the fault has been found or localized, thus at each step reducing the path portion that needs to be examined, i.e. the number of intermediate nodes to which loop back messages need to be sent.

The invention also provides a method as initially referred to which comprises the steps of: implementing fault detection by sending/receiving connectivity check messages from each edge node in a management domain to all other edge nodes in the same management domain; at detection, in a first edge node, of a fault on the path between said first edge node and another, second, edge node, fetching information about the path, containing the intermediate nodes on said path, from holding means in the first edge node; identifying a target intermediate node substantially in the middle of said path; sending a loop back message directly to the identified target intermediate node; identifying a fault path portion as the portion of the path containing the fault based on whether a response is received or not from the identified target intermediate node, such that if a response is received, the fault path portion is identified as the distant path portion adjacent to or connecting to the second edge node, whereas if no response is received, the fault path portion is identified as the path portion adjacent to or connecting to the first edge node; then, unless the fault is localized, identifying a respective subsequent target intermediate middle node and a subsequent fault path portion, step by step, until the fault is localized. In a particular embodiment a fault is localized when the distance between a target intermediate node and the first or second edge node corresponds to one link, i.e. when there are no nodes in between the target node and the edge node closest to it.

It is an advantage of the invention that fast and simple fault detection and localization is enabled. It is also an advantage that the message overhead, i.e. the number of messages that need to be sent in order to localize a fault, is considerably reduced as compared to known methods. It is also an advantage that fast and efficient fault recovery is enabled. It is particularly an advantage that an arrangement and a method respectively are provided which make deployment of Ethernet based carrier or access networks very advantageous and attractive, e.g. for operators and users.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will in the following be more thoroughly described, in a non-limiting manner, and with reference to the accompanying drawings, in which:

FIG. 1 is a schematic block diagram of an arrangement according to a first embodiment of the present invention,

FIG. 2 is a block diagram of the arrangement of FIG. 1 when a fault has occurred and a link is broken,

FIG. 3 is a schematic block diagram of an arrangement according to another embodiment of the invention indicating the sending of messages within the arrangement,

FIG. 4 is a schematic block diagram of an edge node according to an embodiment of the present invention,

FIG. 5 is a schematic flow diagram describing fault detection and localization according to the invention,

FIG. 6 is a schematic flow diagram describing the procedure in an edge node according to a particular embodiment, for fault detection, localization and recovery, and

FIG. 7 is a more detailed flow diagram describing a fault detection and a fault localization procedure according to the present invention.

DETAILED DESCRIPTION OF THE INVENTION

In a particular embodiment of the invention, cf. FIG. 1 and FIG. 2, an edge node 10 is adapted to report fault information of a localized fault to the network management system 100. Preferably, however, it is not dependent on any reaction on behalf of the management system, but is capable of handling temporary recovery itself, independently of the network management system.

Optionally an edge node 10; 20; 30; 40; 50; 60 is adapted to handle detection and localization of faults on nodes and links. Particularly it is capable of distinguishing between node faults and link faults (failures). Alternatively it is merely capable of identifying a fault, the fault being either on a node or on a link connecting to the node.

In an advantageous implementation connectivity check messages (CCM) are implemented for the detection of faults. In one embodiment this means, for example, that an edge node 10; 20; 30; 40; 50; 60 is adapted to periodically send such CCM messages to all other edge nodes or end points and to detect that there is a fault if there is no response to such a CCM message from another edge node, i.e. if no response is received within a given time period.

In another advantageous embodiment reception of CCM messages from other edge nodes is monitored and it is detected if no CCM messages are received from a particular edge node from which reception should take place with a given periodicity. However, the invention also covers other concepts where the actual detection of faults takes place in any other appropriate manner.

Optionally all edge nodes 10; . . . ;60 or end points comprise holding means 11; . . . ;61, for example respective databases, for holding path information about the paths to all other edge nodes or end points in the management domain. This path information includes information about all intermediate nodes of each path, particularly address information of such intermediate nodes. In one particular implementation the path information in an edge node database is received from a management system 100, for example pushed to it. The information can also be collected in other manners.

Preferably, as referred to above, an edge node 10; . . . ;60 is adapted to, independently of any management system 100, initiate and perform a fault recovery procedure. The edge node 10; . . . ;60 is particularly adapted to be managed by a management system 100 holding address information of each intermediate node on each path between respective pairs of edge nodes in a management domain.

Even more particularly the edge node/the arrangement is adapted to be managed by a management system 100 holding physical topology information and active logical topology information for each service provided in the maintenance or management domain, which particularly is handled by a single operator.

Optionally an edge node 10; . . . ;60 is particularly adapted to, at detection and localization of a failure or a fault, initiate set up of an alternative path, particularly by means of fault recovery means 16, cf. FIG. 4. Such an alternative path may comprise a statically configured back-up path. Alternatively it comprises a dynamically configured path. The edge node 10; . . . ;60 is particularly adapted to receive connectivity check messages over a set up alternative path and to send out link tracing messages to find information about the intermediate nodes of such an alternative path. Even more particularly it is adapted to update its database 11; . . . ;61 holding path information upon reception of response messages from the intermediate nodes 1; . . . ;8 to said link tracing messages. It is preferably also adapted to send such updated information to the managing system so that it can update the topology information held therein.

In a preferred implementation, for fault localization purposes, an originating or first edge node 10; . . . ;60 (having detected a fault on a path) is adapted to send loop back messages to identified target intermediate nodes such that, depending on whether a response is received or not from such an intermediate node, the portion of the path containing the fault is consecutively reduced by a given percentage, for example about 50% if a node substantially in the middle of the path is identified as target intermediate node, until the fault is localized. This means that the path portions identified as containing the fault in consecutive steps are each divided into two portions, one of which contains the fault, which means that far fewer loop back messages have to be sent out compared to known methods.

In an advantageous implementation an edge node 10; . . . ;60 is adapted to, at detection of a fault on a path to another or second edge node, send loop back messages and await responses, consecutively dividing the path into a fault containing portion and a non-fault containing portion and sending consecutive loop back messages to identified target intermediate nodes in the respective identified fault path portions. This continues until no response is obtained from a target intermediate node while a response has been received from a node, an intermediate node or an end point, adjacent to said target node on the path. This indicates that the fault is either in the node providing no response itself or on the link between said node and the adjacent node having provided a response.

Particularly the path information in each edge node 10; . . . ;60 comprises a node sequence with intermediate node addresses provided in order between two end points.

Optionally also an edge node 10; . . . ;60 is provided having one or more of the above mentioned optional or advantageous features discussed with reference to the arrangement. Still further a method as referred to earlier in the document is provided which comprises one or more of the optional features discussed above with reference to the arrangement, but with the corresponding method steps.

The present invention is particularly focused on the management domain or maintenance association (MA) level of a single network operator domain. This can be done due to the fact that usually, or most appropriately, the network operator should take the responsibility for network failure recovery, rather than service providers or end customers. In order to make an arrangement or a method according to the present invention work optimally, it is assumed that the management system, NMS 100, managing the arrangement is aware of the addresses of the intermediate nodes, particularly MIPs 1; . . . ;8. This can particularly be justified since a network operator knows details about every node in the operator's own network. It is also supposed that the NMS maintains both physical topology and active logical topology for each service provisioned in the MA. Still further, in the active logical topology, for each edge node or maintenance end point of a given MA, the NMS pushes the path information concerning the paths from the particular MEP to all other MEPs to the database of the MEP in question. This means that each MEP will know, in an MA, how to reach all other MEPs and through which MIPs, or more generally intermediate nodes.

Basically, when one or more edge nodes 10; . . . ;60 detect failures in the network, instead of sending link trace messages to retrieve the addresses of MIPs or intermediate nodes, loop back messages are sent directly to target nodes in an appropriate manner in order to enable fast and easy localization of the fault. This is mainly made possible through the fact that each edge node maintains path information for the paths to all other edge nodes in the given management domain or maintenance association.

Preferably, in embodiments supporting fault recovery, substantially at the same time, an alternative path is set up for the fault containing path portion. In a particularly advantageous implementation a statically configured GMPLS backup path is used. This is advantageous since such a path can be set up extremely fast and in a simple manner. It should however be pointed out that the inventive concept is by no means limited to any specific mechanism for set up of alternative paths; on the contrary this can be done in many different ways once the fault has been detected and localized, which is the emphasis of the present invention.

In preferred implementations, once a fault has been localized, the edge node 10; . . . ;60 having localized it sends out a fault report or an error message to the NMS. Once the alternative path has been set up, the affected edge node will receive CCM messages again. Then it will send out link trace messages to find the new alternative path to the other edge nodes, i.e. establish the addresses of the intermediate nodes 1; . . . ;8, and update its database 11; . . . ;61 accordingly. It also provides the NMS with the updated path information, whereupon the NMS may verify the change and update its own topology map accordingly. The NMS can then take any appropriate action to handle the network failure, for example sending out staff to deal with the fault manually or any other appropriate, more or less automatic, solution; such actions will not be described herein.
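The database update after recovery can be sketched as follows. This is only an illustration under stated assumptions: link trace responses are modelled as hypothetical `(hop, address)` pairs, where `hop` is the position of the responding intermediate node counted from the sending edge node, and the node names and the functions `rebuild_path` and `apply_update` are invented for the example.

```python
def rebuild_path(src, dst, lt_replies):
    """Build a node sequence from link trace replies.

    `lt_replies` is an iterable of (hop, address) pairs; replies may arrive
    in any order, so they are sorted by hop before being assembled.
    """
    ordered = [addr for _, addr in sorted(lt_replies)]
    return (src, *ordered, dst)


def apply_update(db, dst, new_path):
    """Store the new path and return a record that could be sent to the NMS."""
    old_path = db.get(dst)
    db[dst] = new_path
    return {"dst": dst, "old": old_path, "new": new_path}


# Hypothetical edge nodes Ea/Eb and intermediate nodes n1..n3, n9.
db = {"Eb": ("Ea", "n1", "n2", "n3", "Eb")}
replies = [(2, "n2"), (1, "n9"), (3, "n3")]    # arrived out of order
report = apply_update(db, "Eb", rebuild_path("Ea", "Eb", replies))
print(db["Eb"])    # ('Ea', 'n9', 'n2', 'n3', 'Eb')
```

Returning the old and new sequence together is one possible way of letting the NMS verify the change before updating its own topology map.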

Preferably the IEEE 802.1ag mechanism for fault detection is implemented, comprising sending of connectivity check messages. However, instead of strictly following the work flow suggested by IEEE 802.1ag, where LT messages are first sent to locate intermediate nodes before any LB messages are sent out, the edge node 10; . . . ;60 sends out LB messages directly as soon as it has detected a connectivity fault (and identified a target node). This is facilitated if the domain is a single network operator domain or similar.

A basic feature of the inventive concept deals with how an edge node sends LB messages. For a given path between a pair of edge nodes, instead of sending out a loop back message to the closest intermediate node first and then to the second closest intermediate node etc., an edge node 10; . . . ;60 will first send a message to an intermediate node 1; . . . ;8 located substantially in the middle of the path between the two edge nodes. If this intermediate node responds, this means that the failure is located in the further or distant half of the path as seen from the sending edge node. Otherwise the fault is in the closer or proximate half, i.e. in the portion of the path located closer to the sending edge node.

This means that the path will be divided substantially into two portions, one with a problem, i.e. the failure, and one without a problem. For the part that contains no fault, no further LB messages are needed. Instead, the sending edge node will focus on the problematic portion of the path. Thus, the path needed to be searched will, after sending of the first LB message, be shortened by approximately fifty percent.

The sending edge node 10; . . . ;60 will continue the procedure for the remaining fault containing portion and so on until the fault is actually localized. This will greatly speed up the process of locating faults. It should be clear that it does not have to be precisely the middle intermediate node, the main thing being that the path having to be examined will be reduced to a considerable extent in consecutive steps.
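The halving procedure described above amounts to a bisection over the stored node sequence. The following is a minimal sketch, not the procedure of FIG. 7: the function name `localize_fault`, the `responds` callback (standing in for sending one LB message and waiting for a reply) and the node names in the usage example are hypothetical, and a single fault is assumed, so that every node before it answers and every node at or beyond it stays silent.

```python
from typing import Callable, Sequence, Tuple


def localize_fault(path: Sequence[str],
                   responds: Callable[[str], bool]) -> Tuple[str, str, int]:
    """Bisect `path` (edge node, intermediate nodes..., edge node).

    Returns (last_responding_node, first_silent_node, lb_messages_sent).
    The fault is then on the link between the two nodes or on the silent
    node itself.
    """
    lo = 0                  # sending edge node, reachable by definition
    hi = len(path) - 1      # far edge node, known to be unreachable
    sent = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2          # node substantially in the middle
        sent += 1                     # one loop back message to path[mid]
        if responds(path[mid]):
            lo = mid                  # fault lies in the distant portion
        else:
            hi = mid                  # fault lies in the proximate portion
    return path[lo], path[hi], sent


# Hypothetical path with seven intermediate nodes; link n5-n6 is broken.
path = ["A", "n1", "n2", "n3", "n4", "n5", "n6", "n7", "B"]
alive = {"A", "n1", "n2", "n3", "n4", "n5"}
print(localize_fault(path, lambda node: node in alive))   # ('n5', 'n6', 3)
```

The loop stops as soon as the last responding node and the first silent node are adjacent, which is the point at which the text regards the fault as localized.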

If it is well known that on a particular portion of the path there are many problems, some other target intermediate node could be selected depending on that particular knowledge. It should also be clear that it does not have to be the intermediate node right in the middle; for example if there is an even number of intermediate nodes, any of the “middle” or central nodes can be chosen.

A concept is suggested where each edge node contains path information as discussed earlier, and wherein selection of target intermediate nodes is controllable. It may for example be possible to alter the selection procedure depending on the actual path.
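One way to picture controllable target selection is a small selector function with a tie-break for an even number of candidate nodes and an optional hint. The function `select_target` and its parameters `prefer_far` and `suspect` are hypothetical names for this sketch; the text does not prescribe any particular selection policy.

```python
def select_target(lo, hi, prefer_far=False, suspect=None):
    """Return the index of the next node to probe, strictly between lo and hi.

    lo:         index of the last node known to respond
    hi:         index of the first node known not to respond
    prefer_far: with an even number of candidates, take the central node
                that is further from the sending edge node
    suspect:    optional index of a node believed to be fault prone; it is
                probed first if it lies in the open interval (lo, hi)
    """
    if hi - lo < 2:
        raise ValueError("no intermediate node left between lo and hi")
    if suspect is not None and lo < suspect < hi:
        return suspect
    total = lo + hi
    return (total + 1) // 2 if prefer_far else total // 2


# Six intermediate nodes (indices 1..6) between edge nodes at 0 and 7:
print(select_target(0, 7))                    # 3
print(select_target(0, 7, prefer_far=True))   # 4
# Five intermediate nodes: there is a single middle node either way.
print(select_target(0, 6), select_target(0, 6, prefer_far=True))   # 3 3
```

Either central node keeps the halving property, so the choice mainly affects which half is examined first.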

FIG. 1 shows an example of an MA managed by an NMS 100 and comprising four edge nodes, E1 10, E2 20, E3 30 and E4 40. It also comprises a number of intermediate nodes I1 1, I2 2, I3 3, I4 4, I5 5, I6 6, I7 7 and I8 8. Solid lines here indicate active links whereas dashed lines are available links which however are disabled to avoid loops in the network.

Each edge node E1, E2, E3, E4 may have a connection to the NMS 100 as illustrated by dashed-dotted lines, although the inventive concept is not limited to the provisioning of such connections.

Each edge node further comprises a respective database, DB 11, DB 21, DB 31, DB 41 containing, for each respective edge node, a list of path information for the paths to all other edge nodes in the MA in the form of respective node sequences containing information about the respective intermediate nodes. Database 11 of E1 10 in one embodiment contains the following path information, for the paths between E1 and E2, E1 and E3, E1 and E4 respectively:

P12=(E1, I1, I2, I3, I8, E2)

P13=(E1, I1, I2, I3, I4, I6, E3)

P14=(E1, I1, I2, I3, I4, I5, E4)

The databases of the other edge nodes E2, E3, E4 contain corresponding lists.
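As an illustration, the path list of DB 11 could be held as a mapping from destination edge node to node sequence. The node sequences below are the ones listed above for E1; the dictionary layout and the helper names `intermediate_nodes` and `paths_through` are assumptions made for this sketch.

```python
# Path information of E1: destination edge node -> node sequence.
DB_E1 = {
    "E2": ("E1", "I1", "I2", "I3", "I8", "E2"),          # P12
    "E3": ("E1", "I1", "I2", "I3", "I4", "I6", "E3"),    # P13
    "E4": ("E1", "I1", "I2", "I3", "I4", "I5", "E4"),    # P14
}


def intermediate_nodes(db, dest):
    """Return the intermediate nodes on the stored path to `dest`, in order."""
    return db[dest][1:-1]


def paths_through(db, node):
    """Return the destinations whose stored path traverses `node`."""
    return sorted(dest for dest, path in db.items() if node in path)


print(intermediate_nodes(DB_E1, "E3"))   # ('I1', 'I2', 'I3', 'I4', 'I6')
print(paths_through(DB_E1, "I4"))        # ['E3', 'E4']
```

Keeping the nodes in path order is what allows an edge node to pick a middle node without first sending link trace messages.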

In FIG. 2 it is supposed that the link between I3 and I4 is broken. It is also supposed that E1 10 is the edge node that first detects this fault. As discussed above, in its DB 11, E1 10 contains information about the addresses of all the intermediate nodes I1-I8. According to the invention E1 10 will start to send an LB immediately to a controllably selected intermediate node and preferably at the same time report the occurrence of a fault to the NMS 100. In the DB 11 of E1 10, P13 is, as discussed above, the path between E1 and E3 and involves nodes E1, I1, I2, I3, I4, I6, E3.

In the above discussed state of the art solution, E1 would have sent the first LB to I1 and after reception of a response from I1 it would send an LB to I2, and after reception of a response, an LB to I3 etc.

According to the present invention however, E1 first establishes which is the target intermediate node, here substantially the middle node, which is I3 on the path P13. Therefore E1 sends the first LB (LB₁) to I3. After reception of a response from I3, since the path from E1 to I3 is unbroken, E1 will be aware of the fact that the problem is located on the further or distant half of the path seen from E1. This means that E1 will not need to send any LB messages to I1 or I2. Instead E1 will identify a new target intermediate node as I4 and send a second LB message LB₂ to I4, since I4 is substantially the middle node between I3 and E3. This time, however, E1 will receive no response. Then E1 10 knows that the problem is located on the link between I3 and I4 or on the node I4 (the first and second target nodes being adjacent nodes, with no node in between). Thus, in a first step E1 10 identifies I3 as being the first target or middle intermediate node I_(M1) and the distant path portion is identified as the fault portion. Then the second target or middle intermediate node will be identified, here I4 (I_(M2)).
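The difference between the two approaches for this example can be reproduced with a short simulation. The path P13 and the broken link between I3 and I4 are taken from the text; the helper functions and the assumption that every node beyond the broken link stays silent are illustrative only.

```python
P13 = ["E1", "I1", "I2", "I3", "I4", "I6", "E3"]
BROKEN_LINK = ("I3", "I4")


def reachable(path, broken):
    """Nodes that still answer loop back messages sent from path[0]."""
    cut = path.index(broken[1])       # first node beyond the broken link
    return set(path[:cut])


def sequential_probes(path, up):
    """State of the art order: nearest node first, stop at the first silence."""
    probed = []
    for node in path[1:-1]:
        probed.append(node)
        if node not in up:
            break
    return probed


def bisection_probes(path, up):
    """Middle node first; returns the probed nodes and the suspect segment."""
    lo, hi, probed = 0, len(path) - 1, []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probed.append(path[mid])
        if path[mid] in up:
            lo = mid
        else:
            hi = mid
    return probed, (path[lo], path[hi])


up = reachable(P13, BROKEN_LINK)
print(sequential_probes(P13, up))    # ['I1', 'I2', 'I3', 'I4']
print(bisection_probes(P13, up))     # (['I3', 'I4'], ('I3', 'I4'))
```

In this case two LB messages (to I3 and I4) replace the four that nearest-first probing would need, matching the sequence LB₁, LB₂ described above.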

In the figure LB_(IR) denotes the response from I3 to loop back message LB₁ to I3, whereas from I4 or the second target intermediate node there is no response to LB₂. According to the present invention an edge node detecting a fault can implement an algorithm as follows:

It is supposed that an MA is composed of a set of edge nodes, a set of intermediate nodes and a set of links connecting the respective edge nodes via intermediate nodes. The set of edge nodes is here denoted E and each edge node is denoted e_(i), wherein E={e₁, e₂, . . . , e_(m)}. The intermediate node set is denoted I, each intermediate node of the set being denoted i_(j), wherein I={i₁, i₂, . . . , i_(n)}.

Each edge node will maintain path information to all other edge nodes in the same MA. The path information is preferably in the format of a node sequence where start and end points are members of E whereas all other points are members of I. As an example P_(ij) denotes path information from e_(i) to e_(j), and may have the format: P_(ij)=(e_(i), i₁, i₂, . . . , i_(k), e_(j)). For any given edge node e_(i) the fault localization procedure can be formalized as described below in FIG. 7.
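With this notation, the number of loop back messages can be bounded in a worked form. The bounds below are a sketch under assumptions not stated in the text: there is a single fault on the path, every node before the fault responds and every node at or beyond it does not. N_seq and N_bis are names introduced here for the nearest-first and the middle-first procedure respectively.

```latex
% Notation from the text
E = \{e_1, e_2, \ldots, e_m\}, \qquad I = \{i_1, i_2, \ldots, i_n\}
P_{ij} = (e_i, i_1, i_2, \ldots, i_k, e_j)

% Worst-case number of loop back messages for k intermediate nodes
N_{\mathrm{seq}} \le k, \qquad
N_{\mathrm{bis}} \le \lceil \log_2 (k + 1) \rceil

% Example: P_{13} has k = 5, so N_seq <= 5 and N_bis <= ceil(log2 6) = 3
```

The term k+1 arises because the search interval initially spans the k+1 links of the path and each loop back message roughly halves it.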

FIG. 3 shows an embodiment in which it is supposed that a fault is detected by an edge node E5 50 on a path between E5 50 and another edge node E6 60. The edge nodes E5 and E6 may be in communication with an NMS as discussed above. This figure merely describes the fault detection and fault localization.

The path between E5 50 and E6 60 goes via intermediate nodes I1′, I2′, I3′, I4′, I5′, I6′. It is supposed that a fault has occurred between E5 50 and I1′. After detecting a fault on the path, it is supposed that E5 50 identifies a first target intermediate node substantially in the middle of the path, called I′_(M1), as I4′. E5 50 then sends a first loop back message LB₁, I, to I4′. However, I4′ does not respond. This means that E5 50 will know that it is the proximate or closer path portion that has a problem. Therefore E5 50 has to identify a new target intermediate node. It identifies I2′ as a second target intermediate node, I′_(M2). However, after sending a second loop back message LB₂, II, to I2′, there is also in this case no response to E5 50. E5 50 will establish that in this case the proximate or closer path portion (of the first identified fault path portion) is the new path portion with a problem and identifies a new target intermediate node, in this case I1′ (I′_(M3)). E5 50 then sends LB₃, III, to I1′. As in this case there is also no response, E5 50 becomes aware of the fact that the fault must be on the link between E5 and I1′ or on I1′ itself. (I′_(M2) and I′_(M3) are adjacent nodes.) E5 50 then sends a fault report, IV, to the NMS and defines an alternative path, V, using the information in DB 51, and/or a reconfigured alternative path is activated. E5 then starts receiving CCM messages from E6 60 again, VI. Then E5 50 starts sending out LT (Link Trace) messages to new intermediate nodes, or to the nodes on the alternative path, VII, and receives responses from I9′ and I2′, VIII, with their respective addresses. Then E5 50 (not shown in the figure) updates its database 51 with the new alternative path (new nodes and modified sequence) and the address information of the new intermediate nodes, and preferably also forwards the information to the NMS.

FIG. 4 shows an exemplary embodiment of an edge node E1 10 according to the present invention. Edge node E1 10 comprises fault detection means 12 and fault locating means 13 communicating with database 11. Alternatively the fault locating means 13 actually comprise database 11. Edge node E1 10 also, in this embodiment, comprises fault recovery means 16. Fault detection means 12 here comprises CCM transmitting/receiving means 12₁ for generating and sending CCM messages to other edge nodes and receiving CCM messages from other edge nodes. It also comprises a response handler 12₂ in communication with a timer T1 12₄ of detection control means 12₃ adapted to determine whether CCM messages are received within a predetermined time interval. Alternatively it is established if a response to a sent CCM is received within a given time interval. If not, a fault has occurred. Fault reporting means 12₅ may be provided to activate sending of a fault or error report to NMS 100.
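The timer-based detection described above can be sketched as follows. This is a minimal illustration only, not the implementation of fault detection means 12: the class name `CcmMonitor`, the parameter `timeout_s` (standing in for the interval supervised by timer T1), the peer names and the use of explicit timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CcmMonitor:
    """Tracks the last CCM arrival per remote edge node (hypothetical helper)."""

    timeout_s: float  # assumed detection interval, playing the role of timer T1
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_ccm(self, peer: str, now: float) -> None:
        # Record the arrival time of a CCM received from `peer`.
        self.last_seen[peer] = now

    def faulty_peers(self, now: float) -> List[str]:
        # A peer is suspected faulty if no CCM arrived within the interval.
        return sorted(
            peer
            for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = CcmMonitor(timeout_s=3.0)
monitor.on_ccm("E2", now=0.0)
monitor.on_ccm("E3", now=0.0)
monitor.on_ccm("E2", now=2.5)  # E2 keeps sending, E3 goes silent
suspects = monitor.faulty_peers(now=4.0)
print(suspects)
```

In this sketch a suspected peer would then be handed to the fault locating step; how the interval is chosen is left open here.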

The detection control means 12₃ are adapted to provide information to fault localizing means 13, particularly to LB message handling means 14. The LB message handling means 14 comprise target identifying means 14₁ for identifying respective target, particularly middle, intermediate nodes and communicating with LB message generating means 14₂ adapted to send LB messages to respective identified target intermediate nodes. LB message handling means 14 also comprise a response handler 14₃ adapted to wait for and receive LB responses and to provide information to fault path identifying means 14₄ as to whether a response is received or not within a given time period, such that the portion of the path containing a fault can be identified in order to identify the subsequent target intermediate node etc. (unless the fault was located already). When a fault has been located, information thereon is provided to fault control means 15₁ which provide information to fault reporting means 15₂ which in turn provide information to NMS 100. The fault control means 15₁ also provide information to fault recovery means 16 comprising a path switch 17 and an LT handler 18. For example using information about fixed configured alternative paths, or in another embodiment establishing a new alternative path, a switch can be done to the alternative path (or back-up path). The LT handler 18 is in communication with the path switch 17 and communicates with DB 11 for sending LT messages on the new alternative path. New intermediate nodes respond to the sent LT messages with address information, and LT handler 18 provides updates with the address information of the new intermediate nodes of the new node sequence to DB 11, and preferably also to NMS 100.

It should be clear that the fault detection means, the fault location means and the fault recovery means can be implemented in many different ways, this merely showing one particular example. It should also be clear that the fault reporting means 12₅ of the fault detection means 12 are optional and could be constituted by the same means as fault reporting means 15₂ and the fault location or localizing means 13. The detection, location, and optionally recovery means can also be seen as comprised by one single means capable of performing the different functionalities detecting, locating and possibly recovering from fault.

FIG. 5 is a simplified flow diagram describing one embodiment of the present invention as an overview of fault detecting and fault localizing procedures in a particular implementation. It is supposed that each edge node E₀-E_(p) in an MA sends CCM messages to all other edge nodes in the MA along given paths, wherein path information is held in respective edge node databases, 100. It is here supposed that E_(r) does not receive any CCM (or a CCM response) from E_(p) during (or within) a given time interval, 101. E_(r) then fetches address information for intermediate nodes I along the path to E_(p) from its internal DB, 102. Subsequently E_(r) identifies a first target intermediate node (I_(M1)) substantially halfway along, or in the middle of, the path, 103. Then E_(r) sends a loop back message to I_(M1), 104. E_(r) monitors if a response is received from I_(M1) within a predetermined time interval, 105. If yes, E_(r) identifies (the first) fault containing path portion as the distant path portion, connecting to or closer to E_(p) than to E_(r), 106A. Unless the fault is already localized, which here is the case if the target node is adjacent to one of the edge nodes, E_(r) itself or E_(p), E_(r) identifies a second target intermediate node, I_(M2), substantially in the middle of the said distant (fault containing) path portion and sends an LB message to I_(M2) (unless the path length from I_(M1) to I_(M2)=1, in which case the fault has been localized), 107A. If the second LB message is sent to I_(M2), it is awaited to see if there is a response from I_(M2), 108A, similar to the step 105 above, and the procedure continues until the fault has been localized.

On the other hand, if in step 105 above no response is received, E_(r) identifies the fault path portion as the proximate path portion, connecting to or closer to E_(r), 106B. E_(r) then identifies a second target intermediate node, here I′_(M2), substantially in the middle of the said proximate path portion and sends a second LB message to I′_(M2) (unless the path length to I′_(M2)=1, in which case the fault has been localized), 107B. Similar to step 105 above, a response is awaited from I′_(M2), 108B etc., and the procedure continues until the fault has been localized as discussed above.
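The halving procedure of FIG. 5 can be sketched as a binary search over the ordered intermediate nodes of the path. The sketch below is one possible reading, not the claimed method itself: the function name `localize_fault`, the callback `responds` (standing in for "an LB response arrived within the predetermined time interval") and the rounding rule used to pick the middle node are assumptions. The rounding was chosen so that the FIG. 3 scenario is reproduced.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def localize_fault(
    intermediates: Sequence[str],
    responds: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Return (last reachable node, first unreachable node, probed nodes).

    None as first element means the probing edge node itself; None as
    second element means the remote edge node.
    """
    lo = -1                    # index of the last node known to respond
    hi = len(intermediates)    # index of the first node known not to respond
    probed: List[str] = []
    while hi - lo > 1:
        mid = (lo + hi + 1) // 2   # target substantially in the middle
        target = intermediates[mid]
        probed.append(target)
        if responds(target):
            lo = mid               # fault lies in the distant portion
        else:
            hi = mid               # fault lies in the proximate portion
    near = intermediates[lo] if lo >= 0 else None
    far = intermediates[hi] if hi < len(intermediates) else None
    return near, far, probed


# FIG. 3 scenario: fault between the probing edge node and I1, so no
# intermediate node answers.
path = ["I1", "I2", "I3", "I4", "I5", "I6"]
near, far, probed = localize_fault(path, responds=lambda node: False)
print(near, far, probed)
```

The result locates the fault between `near` and `far`, or at `far` itself, which corresponds to the statement that the fault is on a link or on the adjacent node.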

FIG. 6 is a flow diagram schematically describing a possible procedure in an edge node. It is supposed that edge node E1 detects that no CCM is received from E5 during a given interval, 200. E1 then fetches path information from DB concerning the path between E1 and E5 including address information of all intermediate nodes, 201. E1 then sends an LB message to the first identified target I-node, here called I_(M1), 202. Depending on whether a response is received or not from I_(M1) within a predetermined time period, a fault containing portion of the path is identified in E1, 203. The procedure of finding the respective target intermediate nodes and subsequent identifying step of finding fault containing path portions proceeds until the fault is localized, 204, and then the fault is reported to NMS, 205A, and substantially simultaneously, or in any arbitrary order, an alternative path to E5 is established and activated, e.g. using information in the DB of E1 or a predefined alternative path, 206.

In a particular embodiment the information in DB is not needed; information about the location of the fault is sufficient, and predefined alternative paths are fixedly configured. Reception of CCM messages from E5 (on the alternative path) is resumed, 207. E1 then sends LT messages to E5 on the new path, 208. New intermediate nodes respond with address information to E1, 209, whereupon E1 updates its database with the new path information, comprising address information of the new intermediate nodes, 210, and E1 preferably also provides the updates to NMS, 211.
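Steps 208 to 210 can be illustrated with a small sketch of the database update. The text only states that new intermediate nodes reply with their address information; the assumption made here is that each reply can additionally be ordered by a hop index, so that the new node sequence can be rebuilt. The names `PathDb` and `apply_link_trace`, and the node names in the example, are hypothetical.

```python
from typing import Dict, List, Tuple

# Remote edge node -> ordered addresses of the intermediate nodes on the path.
PathDb = Dict[str, List[str]]


def apply_link_trace(
    db: PathDb,
    remote_edge: str,
    replies: List[Tuple[int, str]],
) -> List[str]:
    """Store the path rebuilt from LT replies; return the addresses new to it.

    Each reply is (hop_index, address); the hop index is an assumption of
    this sketch and is used only to order the replies.
    """
    old_path = db.get(remote_edge, [])
    new_path = [address for _, address in sorted(replies)]
    db[remote_edge] = new_path
    return [address for address in new_path if address not in old_path]


db: PathDb = {"E5": ["I1", "I2", "I3"]}
# Replies may arrive out of order; I9 replaces I1 on the alternative path.
new_nodes = apply_link_trace(db, "E5", [(2, "I2"), (1, "I9"), (3, "I3")])
print(db["E5"], new_nodes)
```

The returned list of new addresses is what an edge node could forward to NMS in step 211, under the same assumptions.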

FIG. 7 is a flow diagram describing one example of a fault localization procedure that can be implemented in an edge node. It is supposed that an edge node sends/receives and monitors reception of CCM messages, 300. It is established if any CCMs are received during a given time interval, for example 3× as defined in IEEE 802.1ag as discussed above, 301. If yes, the node continues sending and receiving CCMs, 300. If not, for each path P_(ij) wherein j=1, 2, . . . , i−1, i+1, . . . , m in the path, 302, the middle intermediate MIP node I_(m) along the path sequence is identified, 303. An LB message is then sent to I_(m), 304. Then it is established if a response is received within a given time period, 305. If yes, it is established if all nodes on the relevant path have been tried, 306. If not, the start point of the path sequence is changed to I_(m), 306B, the procedure returns to step 303 above, and a new intermediate node is identified etc. If yes, it is established if j=m, 307. If not, the procedure returns to step 303 above for the next j. If yes, an unexpected error is reported, 308, and the procedure ends. If, however, in step 305 above, no response was received, it is established if the path length is equal to 1, 306A. If yes, the fault is located, 307A₂, and the procedure ends. If the path length is not equal to 1, the end point of the path sequence is changed to be I_(m), 307A₁, and the procedure returns to step 303 above for the next j.

As far as the fault recovery procedure is concerned, when a fault has been localized, the edge node will report the fault information to NMS, for example through SNMP or in any other manner. It should however be clear that the fault recovery process does not rely on any reaction on behalf of NMS upon this fault message. If a GMPLS backup path is statically configured in the network, it can automatically be switched to the backup path. It should be clear that also in other manners alternative backup paths can be configured fixedly or dynamically, or in any appropriate manner.

When an alternative or backup path has been activated, the edge node sending LB messages will start to receive CCM messages again. It will then send out LT messages to the edge node along the path on which it detected a fault. In this manner new intermediate nodes in the newly set up path will respond with their addresses and the sending edge node finds out the new path sequence to the affected edge node and updates its path accordingly as discussed above. Subsequently the sending edge node will send the update to NMS, which in turn preferably verifies if the update is in order by checking the network status. If the update is in order, the NMS will also update its active topology map for the given MA. Finally the NMS takes appropriate actions to handle the fault. It may for example be done by manual interaction or in any appropriate manner.

It is an advantage of the invention that this solution is based on the 802.1ag standard and that preferably message switch types and protocols are taken from the standard which can be implemented directly in products. This makes it very attractive for product development. It is also an advantage that through the inventive concept, complexity is kept at the edges of each maintenance domain and it provides a good scalability. Actually, the larger the network, the more significant are the benefits of the inventive concept.

In a particular embodiment a protection switching mechanism has set up backup paths in advance for critical parts of the network, i.e. network segments where heavy aggregation occurs. This means that when a link failure occurs, the network does not need to re-calculate an alternative path; instead it can use a pre-defined backup path directly. In that manner stringent fail-over time requirements can be met. When a backup path has started to function normally, the broken link can be repaired and re-configured, but this is not time-critical and can be done through NMS or in any other appropriate manner. Through the implementation of the inventive concept, dynamic path re-calculation mechanisms such as for example Spanning Tree Protocol and variants thereof can be disabled in the network, although the invention is not restricted thereto.
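The use of pre-defined backup paths can be sketched as a simple table lookup, which shows why no path re-calculation is needed at fail-over time. This is an illustrative sketch only: the class `PathSwitch`, its fields and the node names are assumptions and do not describe the path switch 17 of FIG. 4 in detail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PathSwitch:
    """Holds active and pre-defined backup paths per remote edge node."""

    active: Dict[str, List[str]]
    backup: Dict[str, List[str]]

    def fail_over(self, remote_edge: str) -> Optional[List[str]]:
        # Activate the pre-defined backup path, if one was configured.
        # Nothing is computed here; the path is only looked up.
        path = self.backup.get(remote_edge)
        if path is not None:
            self.active[remote_edge] = path
        return path


switch = PathSwitch(
    active={"E5": ["I1", "I2", "I3"]},
    backup={"E5": ["I9", "I2", "I3"]},
)
activated = switch.fail_over("E5")
print(activated, switch.active["E5"])
```

If no backup path is configured for the affected edge node, `fail_over` returns `None` in this sketch, and some other recovery mechanism would have to take over.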

It should be clear that the invention is not limited to the specifically illustrated embodiments, but that it can be varied in a number of ways within the scope of the appended claims.

The invention claimed is:
 1. A method for detecting and localizing a failure or fault in an Ethernet based carrier or access network managed by a management system and comprising a number of management domains on different levels, each with a number of end point or edge nodes and a number of intermediate nodes, the method comprising the steps of: implementing fault detection by sending localization or connectivity check messages from each edge node to all other edge nodes in the management domain, and in each edge node monitoring detection of such messages from all other edge nodes; at detection, in a first edge node, of a fault on the path between said first edge node and a second edge node: fetching information about the path, containing the intermediate nodes on the path, from holding means in or communicating with the first edge node; identifying a target intermediate node substantially in the middle of the path containing the fault; sending, directly, a loop back message to the identified target intermediate node; identifying a fault path portion as the portion of the path containing the fault based on whether a response is received or not from the identified target intermediate node, wherein: if a response is received, the fault path portion is identified as the distant path portion, connecting to the second edge node, whereas if no response is received, the fault path portion is identified as the path portion connecting to the first edge node; then, until the fault is localized, identifying a respective new target intermediate node and a new fault containing path portion, step by step, until the fault is localized.
 2. The method according to claim 1, further comprising the step of reporting, from the first edge node, fault information concerning a detected and/or localized fault to the management system.
 3. The method according to claim 1, further comprising the steps of: periodically sending, from an edge node, localization or connectivity check messages to all other edge nodes; monitoring reception of responses to connectivity check messages and/or reception of connectivity check messages from all other edge nodes; and, identifying a fault if no response is received within a predetermined time period and/or if expected connectivity check messages are not received with a given periodicity from an edge node.
 4. The method according to claim 1, further comprising the step of, in each edge node, receiving and holding information about paths to all other edge nodes including all intermediate nodes on the respective paths.
 5. The method according to claim 1, further comprising the steps of: switching, automatically, to and activating an alternative or back up path in a first edge node when a fault has been localized; receiving connectivity check messages over the alternative or back up path; sending link trace messages to the second edge node; receiving response messages with respective address information from all, or all new, intermediate nodes on the alternative or back-up path; updating, in the first edge node, the database with the new alternative path sequence and intermediate node address information; and, sending, from the first edge node, the updated new alternative path information to the management system.
 6. The method according to claim 1, wherein the management domain is operated by a common or single operator.
 7. The method according to claim 1, further comprising the steps of: keeping information about all the addresses of all intermediate nodes in the management system; holding information, in the management system, about the physical topology and about the active logical topology for each service provisioned in the management domain; pushing to each edge node database, of a given management domain, from the management system, path information concerning the paths between the respective edge node and all other edge nodes; and, updating the active topology information in the management system with received updates from edge nodes.